Database backup: insufficient resources for cache withdrawal


Sometimes SQL Server can pose conundrums of such complexity that even the Riddler couldn’t conceive of them. In my mind’s eye, SQL Server is in fact its own operating system, running on top of another operating system. For instance, SQL Server has a task scheduler to manage the threads running within the SQL Server process, and this User-Mode Scheduler marches to a different drum than the one in Windows, the most significant difference being that its multitasking is done cooperatively. Most of the time, anyway. SQL Server also does memory management of a sort, and reserves a fixed memory area for running third-party extensions (extended stored procedures). It has its own processes for committing I/O, in the form of the lazy writer, which periodically flushes SQL pages from memory onto disk. All of this makes SQL Server a bit hard to troubleshoot, in that you don’t know whether to look for configuration problems in SQL Server or a misbehaving installation of Windows.

Riddle me this, Batpersons: when does a disk with 100 gigabytes of free space tell you it’s full?

When the context of the question involves a failing SQL Server backup job, I’d like to contort myself into a shipping crate, have someone address it to Brazil, cover it with postage stamps, and stuff it into the nearest mailbox. Once the initial panic passes, though, I can grab a chair in front of the bat-computer and get to work.

To rule Windows out as the culprit, we can try copying a file of a similar size to the target disk, perhaps a previous backup of the database. I can think offhand of a few situations where this would fail:

First, the disk may be formatted as FAT32, which imposes a 4GB limit on file size. Not very likely these days. Second, a disk quota may have been exceeded for the user copying the file. Even if the copy succeeds, you may want to open a console with RunAs /user:x (where x is the account that normally runs the backup) and repeat the copy from that command prompt.
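Another way to test the copy under SQL Server’s own credentials is to issue it through xp_cmdshell, which (when called by a sysadmin) runs in the context of the SQL Server service account. This is only a sketch; both paths are hypothetical, so substitute a recent backup file and the disk or share the job actually writes to:

-- Test the copy as the SQL Server service account, not as your own login.
-- The source and destination paths below are hypothetical examples.
EXEC master..xp_cmdshell 'copy D:\MSSQL\BACKUP\pubs.bak E:\SQLBackups\pubs_test.bak'

If this copy fails where your interactive copy succeeded, the problem is with the service account’s permissions or quota rather than the disk itself.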

These are relatively easy to fix. Then there are the weird problems that require some deeper investigation. Locked files are one possibility, and we could use OH.EXE (Open Handles, from the Windows Resource Kit) to see which process is locking the backup file. The problem here is that we need to be sitting in front of the box as the job fails; backup problems tend to be sporadic and happen while we are in bed counting sheep, so this isn’t always practical.

In one case, a remote backup to a share failed because the Windows Remote File Cache got corrupted somehow, which is impossible to diagnose but straightforward to repair (read REINSTALL WINDOWS!).

Third-party SQL Server backup solutions (Veritas, LiteSpeed, Legato, SQL Backup) that rely on the Virtual Device Interface (VDI) introduce potential memory-allocation problems that masquerade as disk I/O problems. If the backup tool cannot allocate the requested buffer space from VDI, SQL Server reports error 112 (out of disk space). SQL Server reserves 128MB of memory for these buffers, and that reservation is shared with extended stored procedures and with COM objects created using sp_OACreate. If extended stored procedures are leaking memory, or COM objects are not freed with sp_OADestroy, this free memory will continue to shrink. Running

DBCC MEMORYSTATUS

on the server may be revealing in this case.
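To illustrate where the leak comes from, here is a sketch of the discipline the OLE Automation procedures require; the ProgID is just an example. Every sp_OACreate must eventually be matched by an sp_OADestroy, or the object’s allocation stays in that shared 128MB region until SQL Server is restarted:

DECLARE @obj int, @hr int
-- Create a COM object; this allocates from the shared memory region
EXEC @hr = sp_OACreate 'Scripting.FileSystemObject', @obj OUT
IF @hr = 0
BEGIN
    -- ... use the object here ...
    -- Release it, or the allocation is orphaned until a restart
    EXEC sp_OADestroy @obj
END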

Memory fragmentation is also a potential issue if DBCC still reports oodles of free memory: the total may look healthy, yet no contiguous block may be large enough to satisfy the buffer request. If the VDI-based backup job succeeds after a SQL Server restart, then one of these issues is likely the smoking gun. An improperly configured server could also be to blame. Certain startup options for Windows and SQL Server may be worth looking into, depending on the size of physical memory:

– If you have 3-4GB of memory, include the /3GB switch in the Windows startup.
– If you have 4-8GB of memory, use /3GB and /PAE.
– If you have 16GB or more, use only /PAE, as /3GB will cripple memory over 8GB.

When you have set /PAE, go into SQL Server’s configuration and set the option to use AWE to ON (a configuration sketch follows below). If you do this, however, you also need to specify a maximum memory value in SQL Server; if you do not, SQL Server’s automatic memory management will take all but 128MB of the computer’s memory.
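A minimal sketch of that configuration, assuming an 8GB server where roughly 6GB is left to SQL Server (the figure is an example; adjust it to leave room for Windows and the VDI buffers):

-- Enable AWE and cap SQL Server's memory use (value is in MB)
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'awe enabled', 1
EXEC sp_configure 'max server memory', 6144
RECONFIGURE
-- 'awe enabled' only takes effect after SQL Server is restarted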

The account that runs the SQL Server service needs the ‘lock pages in memory’ user right in the local security policy, or it will have problems allocating the large memory through AWE.

There is a -g startup option for SQL Server that controls how much memory SQL Server leaves free for extended stored procedure code. This is important to set when you have more than 500 databases on a server: each database that’s online uses 64KB of this free memory. Microsoft recommends:

500 databases: -g288
1000 databases: -g372

After setting -g, you need to restart SQL Server. Note that AWE is not available in all editions of SQL Server. If you have hundreds of databases on SQL Server Standard, think about upgrading to Enterprise, or about adding another instance of SQL Server to hold the databases.

If the cause of the failure cannot be determined, most third-party backup programs (as well as SQL Server itself) accept MaxTransferSize and BufferCount parameters, which make it possible to control the size and number of the backup buffers manually.
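As a sketch in native T-SQL (the path and values are illustrative; smaller or fewer buffers put less pressure on the shared memory region):

-- Reduce buffer demands so the allocation fits a fragmented region
BACKUP DATABASE pubs
TO DISK = 'E:\SQLBackups\pubs.bak'
WITH MAXTRANSFERSIZE = 65536,  -- bytes per transfer; 64KB is the minimum
     BUFFERCOUNT = 10          -- number of I/O buffers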

Hopefully, this information will help sort out various native SQL Server and third-party database backup problems caused by resource issues, and ensure a successful backup tomorrow, same bat-time, same bat-server.
